Multiple-Translation Spotting for Mandarin-Taiwanese Speech-to-Speech Translation

Authors

  • Jhing-Fa Wang
  • Shun-Chieh Lin
  • Hsueh-Wei Yang
  • Fan-Min Li
Abstract

The critical issues in speech-to-speech translation are obtaining proper source segments and synthesizing accurate target speech. This article therefore develops a novel multiple-translation spotting method to deal with both issues efficiently. The term multiple-translation spotting refers to the task of extracting target-language synthesis patterns that correspond to a given set of source-language spotted patterns, drawn from multiple conditional pairs of speech patterns known to be translation patterns. From the extracted synthesis patterns, the target speech can be synthesized with a synthesis method based on waveform segment concatenation. Experiments were conducted on Mandarin and Taiwanese. The results show that the proposed approach achieves average translation understanding rates of 80% for Mandarin/Taiwanese translation and 76% for Taiwanese/Mandarin translation.
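The task definition above can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not describe how spotting or pairing is carried out, so the greedy longest-match scan, the token and segment labels, and every function and variable name below are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: each source-language pattern (a tuple of tokens)
# is paired with one target-language synthesis pattern (segment labels).
TRANSLATION_PAIRS: Dict[Tuple[str, ...], List[str]] = {
    ("src_a", "src_b"): ["tgt_1"],
    ("src_c",): ["tgt_2", "tgt_3"],
}


def spot_source_patterns(
    tokens: List[str], pairs: Dict[Tuple[str, ...], List[str]]
) -> List[Tuple[str, ...]]:
    """Scan left to right, keeping the longest known pattern at each position.

    Greedy longest-match is an assumed stand-in for the spotting step.
    """
    max_len = max(len(pattern) for pattern in pairs)
    spotted: List[Tuple[str, ...]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in pairs:
                spotted.append(candidate)
                i += n
                break
        else:
            # No known pattern starts here; skip the token.
            i += 1
    return spotted


def extract_synthesis_patterns(
    spotted: List[Tuple[str, ...]], pairs: Dict[Tuple[str, ...], List[str]]
) -> List[str]:
    """Collect the target-language segment labels for the spotted patterns."""
    labels: List[str] = []
    for pattern in spotted:
        labels.extend(pairs[pattern])
    return labels


def concatenate_segments(
    labels: List[str], waveforms: Dict[str, List[float]]
) -> List[float]:
    """Join stored waveform segments end to end, in label order."""
    samples: List[float] = []
    for label in labels:
        samples.extend(waveforms[label])
    return samples
```

In this sketch an unmatched token is simply skipped, which mirrors the idea of spotting known patterns inside a longer utterance rather than requiring a full parse. A real system would work on acoustic patterns and would need smoothing at segment boundaries, neither of which is modelled here.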


Similar resources

A Mandarin to Taiwanese Min Nan Machine Translation System with Speech Synthesis of Taiwanese Min Nan

This paper presents a design of a Mandarin to Taiwanese Min Nan (abbreviated as Taiwanese hereafter) machine translation system. It is the first machine translation system which focuses on these two languages. An input Mandarin sentence is segmented, tagged and translated word by word according to the part of speech of each word. The candidates come from a Mandarin-Taiwanese dictionary. If more...


利用統計方法及中文訓練資料處理台語文詞性標記 (Modeling Taiwanese POS tagging with statistical methods and Mandarin training data) [In Chinese]

In this paper, we propose a POS tagging method using more than 60 thousand entries of Taiwanese-Mandarin translation dictionary and 10 million words of Mandarin training data to tag Taiwanese. The literary written Taiwanese corpora have both Romanization script and Han-Romanization mixed script, the genre includes prose, fiction and drama. We follow tagset drawn up by CKIP. We develop word alig...


The Effect of Private Speech and Self-Regulation on Translation Quality among Iranian Translation Students: A Mixed-Methods Study

The current study presents findings from a mixed-methods study of investigating the self-regulatory role of private speech (self-talk) on students’ translation quality. The aim of the study was to validate the adapted version of a self-verbalization questionnaire. The construct validity and reliability of the scale were supported by the CFA which revealed that all items reached the acceptable f...


Mandarin-English Information (MEI)

Mandarin-English Information (MEI) is one of the four projects selected for the Johns Hopkins University Summer Workshop 2000. We plan to develop technologies for using written queries to search spoken documents (cross-media) between English and Mandarin Chinese (cross-language). Our research focus is on the integration of speech recognition and machine translation technologies in the context o...


Toward Constructing A Multilingual Speech Corpus for Taiwanese (Min-nan), Hakka, and Mandarin Chinese

The Formosa speech database (ForSDat) is a multilingual speech corpus collected at Chang Gung University and sponsored by the National Science Council of Taiwan. It is expected that a multilingual speech corpus will be collected, covering the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin. This 3-year project has the goal of collecting a phonetically ab...



Journal:
  • IJCLCLP

Volume 9

Publication date 2004